-
Notifications
You must be signed in to change notification settings - Fork 47
[V2] Changes to language definition #1516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: teofr/node_checker
Are you sure you want to change the base?
Conversation
|
OmarTawfik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few questions/suggestions. Thanks!
| - `HexLiteral` and `YulHexLiteral` and `DecimalLiteral` and `YulDecimalLiteral`: | ||
| - It was illegal for them to be followed by `IdentifierStart`. Now we will produce two separate tokens rather than rejecting it. | ||
|
|
||
| ## Language Definition Changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest renaming this section to Grammar, since the rest of the doc also lists language definition changes:
## Grammar|
|
||
| ### IdentifierPath | ||
|
|
||
| Changed from a simple `Separated` list to a structured format to allow the reserved `address` keyword to appear in identifier paths (but not as the head): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we are able to just use Separated(MemberAccessIdentifier, Period) for simplicity? We won't need the extra type, given how commmon IdentifierPath is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, WDYT of IdentifierPathElement instead of MemberAccessIdentifier? the latter conflicts with the fact that one of its two variants is no longer an identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that'd be cleaner. Potentially that could introduce some ambiguity, but probably solvable by writing the parser rule by hand. I'll go down that way
| The cases where using empty tuples are still ambiguous, `(,,,) = ...` can still be a `TupleDeconstructionStatement` or a | ||
| an `AssignmentExpression` with a `TupleExpression` on the lhs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can still be a
TupleDeconstructionStatementor a anAssignmentExpression
Which one? I wonder if we have existing cst_output tests for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how about var (,,,,)? this is legal AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which one?
This is a tricky question since there's still no parser on V2, I'll try to answer it:
- Right now the v2 language definition (with these changes) is still ambiguous, so in theory parsing either one of those options is correct.
- If we choose to do like the V1 and give priority to definitions higher up the
definition.rsfile, then they'd be parsed asTupleDeconstructionStatement - Once the V2 parser is done, we need to handle this ambiguity and choose one, I'd go for those cases being an
AssignmentExpression, since they're not declaring a variable at all. - Right now there's no way to express "a separated item, where every item is optional but has to appear at least once" in the language definition DSL, so it makes it difficult to express this within the language DSL
- We could separate it into a prefix of empty tuple (ie
(,,,,), then an element that must be there (iebool a), and then a postfix of possible empty tuple elements (, bool b, , ,)); but that would make a parsing problem make the CST and general API worse.
- We could separate it into a prefix of empty tuple (ie
- Also, I just checked
solcdoesn't seem to allow empty tuples at all (ie(,,,) = ...or() = ...) after0.5.0, so maybe we need to validate this after parsing.
I wonder if we have existing cst_output tests for this case?
We have some cases, I added a few more (they're only testing V1 for now)
Also, how about
var (,,,,)? this is legal AFAICT.
This is legal with the new definition as well, since the elements of the Separated (UntypedTupleDeconstructionElement) have an optional identifier as its field:
Struct(
name = UntypedTupleDeconstructionElement,
enabled = Till("0.5.0"),
fields = (name = Optional(reference = Identifier))
),|
|
||
| This makes certain cases that were allowed before disallowed in V2, in particular having untyped declarations (like `(a, bool b) = ...`) | ||
| or having typed together with `var` (like `var (a, bool b) = ...`). | ||
| The cases where using empty tuples are still ambiguous, `(,,,) = ...` can still be a `TupleDeconstructionStatement` or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For another PR/discussion, we considered merging
TupleDeconstructionStatementandVariableDeclaration, to merge all variable declarations together, however I think this will look a bit artificial since their shape is quite different.
I think having this distinction (declaring new vars VS assigning to existing ones) is worth the destinction, both syntactically and semantically. WDYT of having the following structure, if it works with LALR?
- use
VariableDeclarationStatementfor any syntax that declares a new name:var x = ...already supportedint x = ...already supported- change
VariableDeclarationStatement::namefield to an enum with two variants:name: Identifier-> existingelements-> a struct holdingLeftParen+Separated(elements)+RightParen
- use
AssignmentExpressionfor any syntax that just assigns values to the LHS:x = ....(x, y) = ....(,,,) = ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having this distinction (...) is worth the destinction
I completely agree
The problem I see with the proposed structure is that currently the int and the var in the first 2 cases are captured by the same definition, so you could end up with a language accepting something like int (a, b, c) = ....
But also, since you want to allow for the elements of the tuple to have the type within it, you'd maybe want to make the var/int struct optional, to allow (bool a, uint b) = ..., but that would also parse x = ... as valid.
Coming from a perspective of appeasing LALRPOP, I'd say the distinction has to be a bit stronger, so allowing VariableDeclarationStatement to be an enum over:
SingleExplicitDeclaration:int x = ...MultiExplicitDeclaration:(bool a, , int b) = ...ImplicitDeclaration(until0.5.0):var a = ...andvar (a, , b) = ...(so this one would have the enum allowing either a singleIdentifieror a tuple ofIdentifier)
This could be simplified when lowering it to an AST
What do you think? I'll try to push a single commit with these changes so you can review them.
d95ed0d to
db16409
Compare
fb9d550 to
c14dcbf
Compare
Changes to V2 language definition, the main reason is to facilitate creating an LR(1) parser. The more complex ones are:
TupleDeconstructionStatement, making it more strict and with a clear separation betweenvarstyle declarations and explicit ones.IdentifierPath, due to makingaddressa reserved keyword.For another PR/discussion, we considered merging
TupleDeconstructionStatementandVariableDeclaration, to merge all variable declarations together, however I think this will look a bit artificial since their shape is quite different. We can still force it if we consider there's value in it, but I think not worth it for now; they'll probably be joined in one of the passes simplifying the ast.